Concurrency & Parallelism in Backend Systems
Why Backend Systems Need Concurrency

Every backend system must handle multiple requests simultaneously.

If a server handles only one request at a time:
- Other users must wait
- Performance degrades and requests may time out or fail

Concurrency helps:
- Utilize system resources efficiently
- Handle thousands of users concurrently
Typical Request Lifecycle

User → Server → Database → Response

Key observation:
- Server spends significant time waiting for external systems (DB, APIs)
Network Latency Examples
- Local DB: ~1–2 ms
- Same region: ~20–30 ms
- Different region: ~90–100 ms
The Core Problem: Idle CPU

While waiting for the DB response:
- CPU does nothing

Modern CPU capability:
- ~3 billion instructions/sec (~3 million per ms)

Example:
- 100 ms wait → ~300 million instructions wasted
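The figure above can be checked with a one-line calculation (the ~3 million instructions/ms rate is the approximation used in these notes):

```python
# Back-of-the-envelope check of the idle-CPU figure.
INSTRUCTIONS_PER_MS = 3_000_000   # ~3 billion instructions/sec
wait_ms = 100                     # e.g., a cross-region DB round trip

wasted = INSTRUCTIONS_PER_MS * wait_ms
print(f"{wasted:,} instructions forgone during a {wait_ms} ms wait")
# → 300,000,000 instructions forgone during a 100 ms wait
```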
IO vs CPU Work

IO-Bound Work

Waiting for:
- Database
- External APIs
- File system

Typically ~70–95% of request time in backend systems

CPU-Bound Work

Actual computation:
- Validation
- JSON parsing
- Encryption
- Image processing
Key Insight

Typical API call:
- ~250 ms IO waiting
- ~10 ms CPU work

Result:
- CPU is busy for only ~10 of 260 ms → ~96% of the time is wasted without concurrency
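The utilization claim follows directly from the two numbers above:

```python
# CPU utilization for a request that waits 250 ms on IO and computes for 10 ms.
io_ms, cpu_ms = 250, 10
utilization = cpu_ms / (io_ms + cpu_ms)

print(f"CPU busy {utilization:.1%} of the request; idle {1 - utilization:.1%}")
# → CPU busy 3.8% of the request; idle 96.2%
```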
What is Concurrency?

Ability to handle multiple tasks at once (logically)

CPU switches between tasks:
- Start → Pause → Resume

Key Idea

While one task waits (IO):
- CPU works on another task
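A minimal Python sketch of this idea, using `time.sleep` as a stand-in for an IO wait: two 0.1 s "IO" tasks run on separate threads, their waits overlap, and total wall time is roughly one wait, not two.

```python
import threading
import time

def io_task(results, i):
    # Stand-in for a DB/API call: the thread blocks, freeing the CPU.
    time.sleep(0.1)
    results[i] = f"task-{i} done"

results = {}
start = time.perf_counter()

threads = [threading.Thread(target=io_task, args=(results, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")   # ~0.10s, not ~0.20s: the waits overlap
```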
What is Parallelism?
- Ability to execute multiple tasks simultaneously (physically)
Requirement
- Multiple CPU cores
Concurrency vs Parallelism
Concurrency
- Single CPU core
- Tasks interleave execution
- Improves resource utilization
Parallelism
- Multiple CPU cores
- Tasks run at same time
- Improves execution speed
Simple Analogy

Concurrency:
- One chef cooking multiple dishes (switching between them)

Parallelism:
- Multiple chefs cooking simultaneously
Timeline Understanding (Conceptual)

Request A starts → uses CPU → waits (DB)

CPU switches to Request B

When A’s DB response returns:
- A becomes runnable again and the CPU resumes it later

Key Point

At any moment:
- Only one task runs (single core)
- But multiple tasks are in progress
Why This Matters

Backend systems are mostly IO-bound

Without concurrency:
- CPU stays idle most of the time

With concurrency:
- CPU stays busy with useful work
When to Use What

Use Concurrency (Most Cases)

IO-heavy workloads:
- DB queries
- API calls
- File operations
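A common pattern for IO-heavy workloads is a thread pool. A hedged sketch with `concurrent.futures` (the `fetch` function and URLs here are hypothetical stand-ins for real API calls):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Hypothetical stand-in for an HTTP/DB call; sleep simulates latency.
    time.sleep(0.05)
    return f"response from {url}"

urls = [f"https://service/api/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

print(len(responses), f"{elapsed:.2f}s")  # ~0.05s total, not 8 × 0.05s
```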
Use Parallelism

CPU-heavy workloads:
- Image processing
- Encryption
- Video encoding
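For CPU-heavy work in Python, processes rather than threads give true parallelism, since each worker process has its own interpreter (CPython's GIL prevents threads from running Python bytecode in parallel). A minimal sketch with `ProcessPoolExecutor`; the prime-counting workload is just an illustrative stand-in for real computation:

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    # Deliberately CPU-bound: trial-division prime counting below `limit`.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Each chunk runs in its own process, so the work spreads across cores.
    limits = [20_000, 20_000, 20_000, 20_000]
    with ProcessPoolExecutor() as pool:
        counts = list(pool.map(count_primes, limits))
    print(counts)
```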
Real-World Backend Behavior

Server handles:
- HTTP requests
- Logging
- Background jobs
- Telemetry

All compete for CPU time

Concurrency ensures:
- Efficient scheduling across all tasks
How Concurrency is Implemented

Two main mechanisms:

1. Threads

OS-level execution units

Each thread:
- Has its own stack
- Has its own instruction pointer

Managed by the OS scheduler
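A minimal illustration, assuming Python's `threading` module as the thread API: each `Thread` maps to an OS-level thread with its own stack and identity (the barrier keeps both threads alive at once, so their identifiers are guaranteed distinct).

```python
import threading

barrier = threading.Barrier(2)   # rendezvous: both threads alive simultaneously

def worker(name, results):
    barrier.wait()
    # Each thread runs on its own stack and has its own OS identifier.
    results[name] = threading.get_ident()

results = {}
t1 = threading.Thread(target=worker, args=("a", results))
t2 = threading.Thread(target=worker, args=("b", results))
t1.start(); t2.start()
t1.join(); t2.join()

print(results["a"] != results["b"])   # → True: two distinct OS threads
```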
Thread Scheduling

OS assigns time slices (e.g., 2 ms)

After the time slice:
- Thread pauses
- Another thread runs

Preemptive Scheduling
- Threads are stopped automatically by the OS
- Ensures fairness across tasks

Blocking Behavior

When a thread hits IO:
- It is marked as blocked

OS switches to another thread

Once IO completes:
- Thread becomes runnable again
Memory Model of Threads
Within Same Process
-
Threads share:
- Heap memory
- Global variables
Between Processes
- No shared memory by default (isolated address spaces)
Communication Between Threads

Done via shared memory

Advantages:
- Fast (no serialization)

Risks:
- Race conditions
- Data corruption
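A hedged sketch of the race-condition risk and the standard fix, using Python's `threading.Lock`: incrementing a shared counter is a read-modify-write, so without the lock concurrent threads can lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # Read-modify-write is not atomic; the lock makes it safe.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # → 400000 (without the lock, this can come out lower)
```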
Parallelism with Threads

If there are multiple CPU cores:
- Multiple threads run truly in parallel

Improves:
- CPU-bound performance

Cost of Threads

1. Memory Overhead

Each thread:
- Stack: ~KBs to MBs

Example:
- 10,000 threads → several GB of memory
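The 10,000-thread figure checks out under a typical stack size (the ~1 MB per-thread default used here is an assumption; actual defaults vary by OS and runtime):

```python
# Back-of-the-envelope memory cost of thread stacks.
stack_bytes = 1 * 1024 * 1024      # assumed ~1 MB default stack per thread
threads = 10_000

total_gb = threads * stack_bytes / 1024**3
print(f"{total_gb:.1f} GB of stack space for {threads:,} threads")
# → 9.8 GB of stack space for 10,000 threads
```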
2. Creation Overhead

Creating a thread involves:
- A system call
- Stack allocation
- Scheduler registration

Takes:
- Microseconds to milliseconds
Key Takeaways
1. Backend Bottleneck
- Mostly IO-bound, not CPU-bound
2. Concurrency is Essential
- Prevents CPU idle time
- Enables handling many users
3. Parallelism is Situational
- Useful for heavy computation tasks
4. Threads are Powerful but Expensive
- High memory + creation cost
- Need careful management
Mental Model to Remember

CPU is valuable → never keep it idle

While waiting → do other work

Structure the program to:
- Pause tasks during IO
- Resume them later
- Keep the CPU busy